Abstract Body


Background / Context: 

Difference-in-differences (DID) strategies are particularly useful for evaluating policy 
effects in natural experiments in which, for example, a policy affects some schools and students 
but not others. However, the standard DID method may produce biased estimation of the policy 
effect if the confounding effect of concurrent events varies by individual characteristics and if 
the experimental group and the comparison group differ in such characteristics (Meyer, 1995). 
Three alternative DID approaches have been proposed in the literature to overcome this problem: 
(1) linear covariance adjusted DID models (e.g., Barnow, Cain, & Goldberger, 1980; Card & 
Krueger, 1993; Dynarski, 2003; Fitzpatrick, 2008); (2) propensity score-based DID analyses 
(Abadie, 2005; Blundell et al., 2004; Cerda et al., 2012; Heckman, Ichimura, Smith, & Todd, 
1998; Heckman, Ichimura, & Todd, 1997); and (3) nonlinear changes-in-changes (CIC) models 
(Athey & Imbens, 2006). We propose a fourth alternative DID approach that utilizes prognostic 
scores (Authors, 2012). Each of these methods invokes a different set of identification 
assumptions that has implications for their relative performance. 

Purpose / Objective / Research Question / Focus of Study: 

This paper reviews the existing alternative DID methods and compares their 
identification assumptions with those of the new prognostic score-based DID strategy. 
Generating data that approximate typical education accountability data, we evaluate the relative 
strengths and limitations of the new DID strategy through a series of Monte Carlo simulations. 

Significance / Novelty of study: 

This study contributes to the literature by comparing alternative DID methods. We 
hypothesize that, in comparison with other existing DID methods, the new prognostic score-based 
DID strategy invokes assumptions that are relatively more plausible and is more likely to 
produce unbiased and efficient estimates of policy effects. 


Statistical, Measurement, or Econometric Model: 

Notation and causal estimand. Let $Y_i$ denote the outcome of individual $i$. Let $G_i = 1, 0$ 
denote whether the individual is in the experimental group affected by the policy or in the 
comparison group unaffected by the policy, respectively. Let $T_i = 1, 0$ denote whether the 
individual was observed in the post-policy year or in the pre-policy year, respectively. Let $X_i$ 
denote a vector of covariates measuring individual characteristics that are not caused by the 
policy. Let $Y^{(1)}_{iG_1T_0}$ denote the potential outcome that individual $i$ would display if in the 
experimental group and counterfactually having exposure to the policy in the pre-policy year; let 
$Y^{(0)}_{iG_1T_0}$ denote the individual's potential outcome in the pre-policy year in the absence of the 
policy. Suppose we are interested in estimating the average policy effect for individuals in the 
experimental group in the pre-policy year (i.e., the treatment effect on the untreated in the 
experimental group). The causal estimand is then $E[Y^{(1)}_{G_1T_0} - Y^{(0)}_{G_1T_0} \mid G = 1, T = 0]$. 

Standard DID method. The standard DID estimator is 
$\{E[Y \mid G = 1, T = 1] - E[Y \mid G = 1, T = 0]\} - \{E[Y \mid G = 0, T = 1] - E[Y \mid G = 0, T = 0]\}$ 
and can be obtained through analyzing a linear model 
$Y_i = \alpha_0 + \alpha_1 T_i + \beta_0 G_i + \beta_1 G_i T_i + e_i$. The average confounding effect 
of the concurrent events for the experimental group is 
$b_{G_1} = E[Y^{(0)}_{G_1T_1} \mid G = 1, T = 1] - E[Y^{(0)}_{G_1T_0} \mid G = 1, T = 0]$ 
and that for the comparison group is 
$b_{G_0} = E[Y^{(0)}_{G_0T_1} \mid G = 0, T = 1] - E[Y^{(0)}_{G_0T_0} \mid G = 0, T = 0]$. 
The standard DID method removes the confounding effect under the 
assumption $b_{G_1} = b_{G_0}$. This assumption will be violated, and the DID results will therefore be 
biased, for example, if the confounding effect varies by individual characteristics and if the 
experimental group and the comparison group differ in such characteristics. 
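
As a minimal illustration (not part of the original study), the following Python sketch computes the standard DID estimate in two ways: from the four group-by-year cell means, and as the coefficient on the group-by-year interaction in the linear model. The simulated data, the sample size, and the true effect of 2.0 are hypothetical values chosen only for this example.

```python
import numpy as np

def did_cell_means(y, g, t):
    """Standard DID computed from the four group-by-year cell means."""
    m = lambda gg, tt: y[(g == gg) & (t == tt)].mean()
    return (m(1, 1) - m(1, 0)) - (m(0, 1) - m(0, 0))

def did_ols(y, g, t):
    """Coefficient on G*T from OLS of Y on (1, T, G, G*T)."""
    design = np.column_stack([np.ones_like(y), t, g, g * t])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[3]

# Hypothetical repeated cross-sections in which the standard DID assumption holds.
rng = np.random.default_rng(0)
n = 4000
g = rng.integers(0, 2, n)          # 1 = experimental group, 0 = comparison group
t = rng.integers(0, 2, n)          # 1 = post-policy year, 0 = pre-policy year
y = 1.0 + 0.5 * t + 1.5 * g + 2.0 * g * t + rng.normal(0, 1, n)

est_means = did_cell_means(y, g, t)
est_ols = did_ols(y, g, t)
print(est_means, est_ols)
```

Because the linear model is saturated in G and T, the two estimates coincide up to numerical precision.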

Linear covariance adjusted DID method. Prior research has typically employed a DID 
model with linear covariance adjustment for observed pretreatment characteristics, 
$Y_i = \alpha_0 + \alpha_1 T_i + \beta_0 G_i + \beta_1 G_i T_i + \lambda X_i + e_i$. 
Here $\hat{\beta}_1$ is an unbiased estimate of the causal estimand 
under the key assumption that the confounding effect of concurrent events for the experimental 
group is the same as that for the comparison group within levels of pretreatment covariates 
$X = x$, that is, $b_{G_1|x} = b_{G_0|x}$. Moreover, this method requires correct specification of the 
functional form of the outcome model and relies heavily on linear extrapolation. 
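
A hedged sketch of covariance adjustment follows. It assumes a hypothetical data-generating process in which a binary covariate X shifts the outcome and the composition of the experimental group changes between years, so the unadjusted DID is biased while the adjusted model, which is correctly specified here by construction, recovers the simulated effect of 2.0. All numeric values are illustrative assumptions.

```python
import numpy as np

def did_ols(y, g, t, x=None):
    """Coefficient on G*T from OLS of Y on (1, T, G, G*T[, X])."""
    columns = [np.ones_like(y), t, g, g * t]
    if x is not None:
        columns.append(x)
    coef, *_ = np.linalg.lstsq(np.column_stack(columns), y, rcond=None)
    return coef[3]

rng = np.random.default_rng(1)
n = 20000
g = rng.integers(0, 2, n)
t = rng.integers(0, 2, n)
# Hypothetical compositional change: Pr(X = 1) rises from 0.3 to 0.7 in the
# experimental group and stays at 0.5 in the comparison group.
p_x = np.where(g == 1, np.where(t == 1, 0.7, 0.3), 0.5)
x = rng.binomial(1, p_x)
y = 1.0 + 0.5 * t + 1.5 * g + 2.0 * g * t + 3.0 * x + rng.normal(0, 1, n)

unadjusted = did_ols(y, g, t)        # absorbs the shift in X (about 2 + 3 * 0.4)
adjusted = did_ols(y, g, t, x)       # close to the simulated effect of 2.0
print(unadjusted, adjusted)
```

The adjustment works here only because the outcome is truly linear and additive in X; a misspecified functional form would not be protected in the same way.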

Propensity score-based DID method. Heckman and colleagues (1997, 1998) used 
propensity score matching to identify the common support in the observed covariates X and to 
equate the distribution of X between the experimental group and the comparison group. Abadie 
(2005) proposed using propensity score-based inverse-probability-of-treatment weighting 
(IPTW) to equate the distribution of X between the two groups. Let $\theta(X) = \mathrm{pr}(G = 1 \mid X)$ be the 
propensity score representing the conditional probability that an individual would be assigned to 
the experimental group given X. Propensity score matching and IPTW assume that the potential outcomes 
are independent of G given the propensity score $\theta(X)$ in year t for t = 0, 1. This assumption 
requires that $E[Y^{(0)}_{G_1T_1} \mid G = 1, T = 1, X = x] = E[Y^{(0)}_{G_0T_1} \mid G = 0, T = 1, X = x]$ and 
$E[Y^{(0)}_{G_1T_0} \mid G = 1, T = 0, X = x] = E[Y^{(0)}_{G_0T_0} \mid G = 0, T = 0, X = x]$. In contrast, the assumption 
$b_{G_1|x} = b_{G_0|x}$ holds even when the average potential outcomes of the experimental group and 
the comparison group are unequal in a given year. 
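
The weighting idea can be sketched as follows. This is a simplified odds-weighting illustration in the spirit of IPTW, not Abadie's (2005) estimator: it assumes a single discrete covariate, estimates the propensity score within each year as a cell proportion, and reweights comparison units so that their covariate distribution matches that of the experimental group. The data-generating process and all numeric values are hypothetical.

```python
import numpy as np

def propensity_by_cell(g, x):
    """Pr(G = 1 | X = x) estimated as a cell proportion (discrete X assumed)."""
    theta = np.empty(len(g), dtype=float)
    for value in np.unique(x):
        cell = x == value
        theta[cell] = g[cell].mean()
    return theta

def cell_mean_did(y, g, t):
    m = lambda gg, tt: y[(g == gg) & (t == tt)].mean()
    return (m(1, 1) - m(1, 0)) - (m(0, 1) - m(0, 0))

def weighted_did(y, g, t, x):
    """DID with comparison units weighted by the propensity odds in each year."""
    means = {}
    for year in (0, 1):
        in_year = t == year
        yy, gg = y[in_year], g[in_year]
        theta = propensity_by_cell(gg, x[in_year])
        w = theta / (1.0 - theta)    # assumes common support, so theta < 1
        means[(1, year)] = yy[gg == 1].mean()
        means[(0, year)] = np.average(yy[gg == 0], weights=w[gg == 0])
    return (means[(1, 1)] - means[(1, 0)]) - (means[(0, 1)] - means[(0, 0)])

rng = np.random.default_rng(2)
n = 20000
g = rng.integers(0, 2, n)
t = rng.integers(0, 2, n)
x = rng.binomial(1, np.where(g == 1, 0.7, 0.3))   # groups differ in X
# Hypothetical confounding: the year-to-year change depends on X.
y = 1.0 + 1.5 * g + (0.5 + 1.0 * x) * t + 2.0 * g * t + rng.normal(0, 1, n)

unweighted = cell_mean_did(y, g, t)   # biased upward (about 2.4)
weighted = weighted_did(y, g, t, x)   # close to the simulated effect of 2.0
print(unweighted, weighted)
```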

Nonlinear CIC Adjustment. When the experimental group and the comparison group are 
different in unobserved pretreatment characteristics U, to estimate the treatment effect on the 
treated, Athey and Imbens (2006) proposed a nonlinear CIC model estimating the entire 
distribution of the counterfactual outcome for the experimental group based on the observed 
change in the outcome distribution of the comparison group. Key assumptions of the CIC 
method include (a) a single index model in year t represented by h(u, t) and a common change in 
production function from h(u, 0) to h(u, 1) regardless of group membership; (b) the production 
function h(u, t) is strictly increasing in u given t; (c) time invariance in the distribution of U 
within groups; and (d) the support for the distribution of U in the experimental group is a subset of 
that in the comparison group. 
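
To make the CIC construction concrete, the sketch below maps each pre-policy outcome of the experimental group to its rank in the comparison group's pre-policy distribution and then through the comparison group's post-policy quantile function, which is the counterfactual transformation described by Athey and Imbens (2006) for continuous outcomes. The production functions h(u, 0) = u and h(u, 1) = 1 + 2u, the distributions of U, and the effect of 3.0 are hypothetical; numpy's default linear-interpolation quantile is used in place of an exact inverse empirical CDF.

```python
import numpy as np

def cic_att(y00, y01, y10, y11):
    """CIC effect on the treated; y_gt holds outcomes for group g in year t."""
    y00_sorted = np.sort(y00)
    # Empirical CDF of comparison pre-policy outcomes, evaluated at each y10.
    ranks = np.searchsorted(y00_sorted, y10, side="right") / len(y00)
    # Send those ranks through the comparison post-policy quantile function.
    counterfactual = np.quantile(y01, ranks)
    return y11.mean() - counterfactual.mean()

rng = np.random.default_rng(3)
n = 10000
h0 = lambda u: u                 # hypothetical production function in year 0
h1 = lambda u: 1.0 + 2.0 * u     # hypothetical production function in year 1
y00 = h0(rng.normal(0.0, 1.0, n))
y01 = h1(rng.normal(0.0, 1.0, n))
y10 = h0(rng.normal(0.5, 1.0, n))          # experimental group has higher U
y11 = h1(rng.normal(0.5, 1.0, n)) + 3.0    # simulated effect on the treated

standard = (y11.mean() - y10.mean()) - (y01.mean() - y00.mean())  # about 3.5
cic = cic_att(y00, y01, y10, y11)                                 # about 3.0
print(standard, cic)
```

In this setup the standard DID is biased because the change in the production function rescales U and the groups differ in U, whereas the CIC transformation accounts for the rescaling.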

Prognostic score-based DID method. This new method aims to equate the predicted 
amount of confounding of concurrent events across the pre-policy experimental group, the post-policy 
experimental group, the pre-policy comparison group, and the post-policy comparison 
group within subclasses of units. A pair of prognostic scores (Hansen, 2008) per unit, denoted by 
$\psi_0$ and $\psi_1$, represent the predicted pre-policy outcome and the predicted post-policy outcome 
under the comparison condition in the absence of policy change. DID analysis within each 
homogeneous subpopulation defined by $\psi_0$ and $\psi_1$ is expected to generate an unbiased estimate 
of the policy effect of interest under the identification assumption that 
$b_{G_1|\psi_0,\psi_1} = b_{G_0|\psi_0,\psi_1}$, where 
$b_{G_1|\psi_0,\psi_1} = E[Y^{(0)}_{G_1T_1} \mid G = 1, T = 1, \psi_0, \psi_1] - E[Y^{(0)}_{G_1T_0} \mid G = 1, T = 0, \psi_0, \psi_1]$ and 
$b_{G_0|\psi_0,\psi_1} = E[Y^{(0)}_{G_0T_1} \mid G = 0, T = 1, \psi_0, \psi_1] - E[Y^{(0)}_{G_0T_0} \mid G = 0, T = 0, \psi_0, \psi_1]$. This assumption 
implies that (a) $\psi_t = f(x, t)$, for t = 0, 1, defines the function for the counterfactual outcome 
under the comparison condition at time t regardless of one's actual treatment group membership, 
and that (b) the support for the observed covariates X in the comparison school, denoted by $\mathcal{X}_0$, 
encompasses the support in the experimental school, denoted by $\mathcal{X}_1$. 
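
One way the procedure described above might be operationalized is sketched below; it is an illustrative reading, not the authors' implementation. The assumptions are ours: the two prognostic score models are fitted by ordinary least squares on comparison-group data in each year, strata are formed from quantiles of the experimental group's scores, units outside that range are dropped, and stratum-specific DID estimates are averaged with weights equal to the number of experimental units. The simulated data and all numeric values are hypothetical.

```python
import numpy as np

def fit_prognostic(y, x, mask):
    """OLS of Y on (1, X) among units in `mask`; predictions for every unit."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design[mask], y[mask], rcond=None)
    return design @ coef

def score_bins(psi, g, k):
    """Bin scores by quantiles of the experimental group; -1 marks off-support units."""
    edges = np.quantile(psi[g == 1], np.linspace(0, 1, k + 1))
    bins = np.digitize(psi, edges[1:-1])
    bins[(psi < edges[0]) | (psi > edges[-1])] = -1
    return bins

def prognostic_did(y, g, t, x, k=5):
    psi0 = fit_prognostic(y, x, (g == 0) & (t == 0))   # predicted pre-policy outcome
    psi1 = fit_prognostic(y, x, (g == 0) & (t == 1))   # predicted post-policy outcome
    b0, b1 = score_bins(psi0, g, k), score_bins(psi1, g, k)
    total, weight = 0.0, 0
    for s0 in range(k):
        for s1 in range(k):
            stratum = (b0 == s0) & (b1 == s1)
            cells = [y[stratum & (g == gg) & (t == tt)]
                     for gg in (1, 0) for tt in (1, 0)]
            if min(len(c) for c in cells) == 0:
                continue                                # stratum lacks common support
            did = (cells[0].mean() - cells[1].mean()) - (cells[2].mean() - cells[3].mean())
            n_exp = len(cells[0]) + len(cells[1])
            total += n_exp * did
            weight += n_exp
    return total / weight

rng = np.random.default_rng(4)
n = 20000
g = rng.integers(0, 2, n)
t = rng.integers(0, 2, n)
# Experimental support [0.4, 1] lies inside comparison support [0, 1].
x = np.where(g == 1, rng.uniform(0.4, 1.0, n), rng.uniform(0.0, 1.0, n))
# Hypothetical confounding: the year-to-year change depends on X.
y = 1.0 + 1.5 * g + x + (0.5 + 3.0 * x) * t + 2.0 * g * t + rng.normal(0, 1, n)

m = lambda gg, tt: y[(g == gg) & (t == tt)].mean()
standard = (m(1, 1) - m(1, 0)) - (m(0, 1) - m(0, 0))   # about 2.6
prognostic = prognostic_did(y, g, t, x)                # about 2.0
print(standard, prognostic)
```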

Usefulness / Applicability of Method: 

Comparing the identification assumptions across the above DID methods, we highlight 
some important strengths of the prognostic score-based DID method. These include: 

(1) Unlike the linear covariance adjusted DID models, the prognostic score-based DID 
strategy does not assume that, conditional on the observed pretreatment covariates X, the average 
treatment effect for the untreated in the experimental group is the same as that for the entire 
population had all units been untreated. This advantage is shared by the propensity 
score-based DID methods and the nonlinear CIC models. 

(2) Unlike linear covariance adjusted DID, prognostic score-based DID does not assume 
invariant x-y relationships across time. 

(3) Prognostic score-based DID assumes that, if an experimental unit had 
counterfactually been assigned to the comparison condition in a given time period, the x-y 
relationship would have been the same as that of the comparison units with the same observed 
pretreatment characteristics. In contrast, the nonlinear CIC method requires applying a single 
production function to the outcomes of both the experimental group and the comparison group in 
a given time period. The latter seems implausible because a change in the distribution of U will 
likely change the u-y relationship. 

(4) Unlike the nonlinear CIC models, the prognostic score-based DID models do not 
require strict monotonicity. Nor do they require that the outcome contain no measurement error. 
These advantages are shared by linear covariance adjusted DID and propensity score-based DID. 

(5) While the nonlinear CIC models assume time invariance within the experimental 
group and the comparison group, this is not a requirement for all the other DID methods 
including prognostic score-based DID. 

(6) A major difference between propensity score-based DID and prognostic score-based 
DID is that the latter does not require equating the pretreatment composition of the experimental 
group and the comparison group. The same advantage is shared by linear covariance adjusted 
DID and nonlinear CIC. 

(7) Similar to propensity score-based DID, prognostic score-based DID emphasizes and 
verifies the common support between the experimental group and the comparison group with 
regard to observed covariates X, which effectively avoids unwarranted extrapolation. However, 
propensity score-based DID may suffer if some pretreatment covariates unrelated to the outcome 
lead to a shrinkage in the common support. The nonlinear CIC models make a similar 
assumption with regard to the unobserved covariates U that cannot be empirically verified. 

(8) Similar to the propensity score-based DID, the prognostic score-based DID greatly 
reduces the dimensionality of covariates for adjustment, which is a major advantage over the 
linear covariance adjusted DID. 

(9) Both prognostic score-based DID and DID with propensity score-based matching 
enable researchers to detect heterogeneity in the confounding effects of concurrent events as well 
as in the policy effect. 

We also emphasize some potential limitations of the prognostic score-based DID method: 
(1) A unique feature of the nonlinear CIC models is that they do not require explicit 
specification of the outcome model. All other DID strategies require explicit modeling that 
involves functional forms. The prognostic score-based DID models are no exception. To 
alleviate the impact of misspecifying the functional form of the model, researchers may employ 
various semi-parametric or nonparametric approaches, a topic beyond the scope of this paper. 

(2) Both linear covariance adjusted DID and propensity score-based DID would suffer if 
the experimental group and the comparison group differ in the distribution of unobservables U and 
if the amount of confounding of concurrent events is a function of U. The same type of 
unobservables U, if independent of the observed covariates X, would also bias the prognostic 
score-based DID estimate of the policy effect. However, if the confounding does not depend on 
U or if the distribution of U is the same between the experimental group and the comparison 
group conditioning on X, then omitting U would not introduce bias. The special implication for 
the prognostic score-based DID method is that the estimated policy effect could possibly be 
unbiased even when the prognostic score models have low predictive power. 

Research Design: 

We examine the following research questions in the simulation study: 

(1) In the best possible world in which all the assumptions required by the standard DID 
method hold, how does the prognostic score-based DID result compare with those of other DID 
methods in terms of bias reduction, precision, and mean square error? 

(2) If the pretreatment composition differs either between the experimental group and the 
comparison group or between the pre-policy and the post-policy years, or both, how does the 
prognostic score-based DID result compare with those of other DID methods? 

(3) If the covariate-outcome relationship conditional on other observed covariates 
changes over time within each group yet remains the same across the experimental group and the 
comparison group at a given time under the same policy, how does the prognostic score-based 
DID result compare with those of other DID methods? 

(4) How severe are the consequences when the linear covariance adjusted DID model, 
the propensity score model, and the prognostic score model are misspecified in their respective 
functional forms? 

(5) When the confounding effect of concurrent events is a function of U and when the 
distribution of U differs between the experimental group and the comparison group and, 
additionally, when the distribution of U either remains the same or changes over time, how 
sensitive is the prognostic score-based DID result to the omission of U when compared with 
other DID methods? 
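
The evaluation criteria named in question (1) can be computed with a simple Monte Carlo loop, sketched below under stated assumptions. The data generator corresponds to the "best possible world" in which the standard DID assumption holds; the sample size, number of replications, and true effect are hypothetical, and the actual simulation design of the study may differ.

```python
import numpy as np

def simulate(rng, n=2000, effect=2.0):
    """One hypothetical data set in which the standard DID assumption holds."""
    g = rng.integers(0, 2, n)
    t = rng.integers(0, 2, n)
    y = 1.0 + 0.5 * t + 1.5 * g + effect * g * t + rng.normal(0, 1, n)
    return y, g, t

def standard_did(y, g, t):
    m = lambda gg, tt: y[(g == gg) & (t == tt)].mean()
    return (m(1, 1) - m(1, 0)) - (m(0, 1) - m(0, 0))

def monte_carlo(estimator, effect=2.0, reps=200, seed=0):
    """Bias, variance, and mean square error of `estimator` across replications."""
    rng = np.random.default_rng(seed)
    est = np.array([estimator(*simulate(rng, effect=effect)) for _ in range(reps)])
    bias = est.mean() - effect
    variance = est.var()                     # population form, so MSE = bias^2 + variance
    mse = np.mean((est - effect) ** 2)
    return bias, variance, mse

bias, variance, mse = monte_carlo(standard_did)
print(bias, variance, mse)
```

Any of the alternative estimators can be passed in place of `standard_did`, provided the generator is extended to return the covariates that estimator needs.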

Findings / Results: 

Results are forthcoming. 

Conclusions: 

Empirical findings obtained from simulation studies will inform our understanding of the 
relative performance of the new prognostic score-based DID method in comparison with the 
existing DID methods under a wide array of scenarios often plausible in educational policy 
evaluations with accountability data. The results will contribute to the statistics and econometrics 
literature and will provide practical guidance for applied researchers. 